Estimating Node Importance in Knowledge Graphs Using Graph Neural Networks
How can we estimate the importance of nodes in a knowledge graph (KG)? A KG
is a multi-relational graph that has proven valuable for many tasks including
question answering and semantic search. In this paper, we present GENI, a
method for tackling the problem of estimating node importance in KGs, which
enables several downstream applications such as item recommendation and
resource allocation. While a number of approaches have been developed to
address this problem for general graphs, they do not fully utilize information
available in KGs, or lack the flexibility needed to model the complex
relationships between entities and their importance. To address these limitations, we explore
supervised machine learning algorithms. In particular, building upon recent
advancement of graph neural networks (GNNs), we develop GENI, a GNN-based
method designed to deal with distinctive challenges involved with predicting
node importance in KGs. Instead of aggregating node embeddings, our method
aggregates importance scores via a predicate-aware attention mechanism and
flexible centrality adjustment. In our evaluation of GENI and existing
methods on predicting node importance in real-world KGs with different
characteristics, GENI achieves 5-17% higher NDCG@100 than the state of the art.
Comment: KDD 2019 Research Track. 11 pages. Changelog: Type 3 font removed,
and minor updates made in the Appendix (v2).
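The abstract describes aggregating importance scores (rather than embeddings) with predicate-aware attention, plus a centrality adjustment. The following is a minimal toy sketch of that idea, not the paper's implementation: the graph, attention weights, and the log-degree adjustment are all invented for illustration.

```python
import math
from collections import defaultdict

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def aggregate_scores(scores, edges, pred_weight):
    """One round of predicate-aware score aggregation (toy version).

    scores: {node: current importance estimate}
    edges: list of (src, predicate, dst) triples
    pred_weight: {predicate: scalar} standing in for learned attention logits
    """
    neighbors = defaultdict(list)
    for u, p, v in edges:
        neighbors[v].append((u, p))
        neighbors[u].append((v, p))  # treat the KG as undirected in this sketch
    new_scores = {}
    for n, score in scores.items():
        nbrs = neighbors[n]
        if not nbrs:
            new_scores[n] = score
            continue
        # attention over neighbors, biased by each edge's predicate
        attn = softmax([pred_weight[p] for _, p in nbrs])
        new_scores[n] = sum(a * scores[u] for a, (u, _) in zip(attn, nbrs))
    return new_scores

def centrality_adjust(scores, edges):
    """Scale each aggregated score by log(1 + degree), a simple stand-in
    for the flexible centrality adjustment mentioned in the abstract."""
    deg = defaultdict(int)
    for u, _, v in edges:
        deg[u] += 1
        deg[v] += 1
    return {n: s * math.log(1 + deg[n]) for n, s in scores.items()}
```

With uniform predicate weights the attention reduces to plain averaging; distinct per-predicate weights let some relation types contribute more to a neighbor's score than others.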
Robust Group Linkage
We study the problem of group linkage: linking records that refer to entities
in the same group. Applications for group linkage include finding businesses in
the same chain, finding conference attendees from the same affiliation, finding
players from the same team, etc. Group linkage faces challenges not present for
traditional record linkage. First, although different members in the same group
can share some similar global values of an attribute, they represent different
entities, and so can also have distinct local values for the same or different
attributes, requiring a high tolerance for value diversity. Second, groups can
be huge (with tens of thousands of records), requiring high scalability even
after using good blocking strategies.
We present a two-stage algorithm: the first stage identifies cores containing
records that are very likely to belong to the same group, while being robust to
possible erroneous values; the second stage collects strong evidence from the
cores and leverages it for merging more records into the same group, while
being tolerant to differences in local values of an attribute. Experimental
results show the high effectiveness and efficiency of our algorithm on various
real-world data sets.
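The two-stage structure above can be sketched in a few lines. This is an illustrative toy, not the paper's algorithm: the choice of a reliable "core" attribute (a hypothetical phone number), the core-size threshold, and exact-name evidence matching are all invented stand-ins for the paper's robust similarity measures.

```python
from collections import Counter, defaultdict

def find_cores(records, key="phone"):
    """Stage 1: form high-precision cores from records that agree exactly
    on a reliable attribute (hypothetical 'phone' field in this sketch)."""
    by_key = defaultdict(list)
    for r in records:
        if r.get(key):
            by_key[r[key]].append(r)
    # require at least two agreeing records before trusting a core
    return [rs for rs in by_key.values() if len(rs) >= 2]

def merge_with_evidence(records, cores, name_field="name"):
    """Stage 2: collect strong evidence (the dominant name) from each core,
    then merge remaining records whose name matches that evidence."""
    groups = [list(core) for core in cores]
    evidence = [Counter(r[name_field] for r in core).most_common(1)[0][0]
                for core in cores]
    assigned = {id(r) for core in cores for r in core}
    for r in records:
        if id(r) in assigned:
            continue
        for gi, name in enumerate(evidence):
            if r[name_field] == name:  # a tolerant similarity test would go here
                groups[gi].append(r)
                break
    return groups
```

The split mirrors the abstract's logic: stage 1 trades recall for precision so that stage 2 can safely generalize from the cores while tolerating diverse local attribute values.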